This notebooks will help me talk through a bit of material regarding spatial data, with some introduction to spatial joins.

Key topics for today:

Packages

Standards:

library(knitr)
library(tidyverse)
library(janitor)
library(lubridate) # because we will probably see some dates
library(here) # a package I haven't taught you about before that doesn't do much, but ....
library(sf)
Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(tmap)
library(tidycensus)

Some additional packages focused on today’s work:

library(sf) # working with simple features - geospatial
library(tmap)
library(tidycensus)

Informational resources

Using the Neighborhood Geospatial Data (using /data)

Our first data source comes from opendata.dc

https://opendata.dc.gov/datasets/DCGIS::dc-health-planning-neighborhoods/about

I will use the GeoJSON file. (Newer, not necessarily better, but … a single file. Not smaller, but … this one is not big.)

Data is easily readable

neigh=st_read(here("DC_Health_Planning_Neighborhoods.geojson")) %>% clean_names()
Reading layer `DC_Health_Planning_Neighborhoods' from data source 
  `/Users/sarahkohls/Desktop/College Classes/Intro to Data Science/DS241Portfolio/DC_Health_Planning_Neighborhoods.geojson' 
  using driver `GeoJSON'
Simple feature collection with 51 features and 8 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -77.11976 ymin: 38.79165 xmax: -76.9094 ymax: 38.99556
Geodetic CRS:  WGS 84
class(neigh)
[1] "sf"         "data.frame"
plot(neigh)

Reminder - Joins

df1=tibble(fruit=c("apple","banana","cherry"),cost=c(1.5,1.2,2.25))
df2=tibble(fruit=c("apple","apple","cherry","lemon"),
           desert=c("pie","cobbler","cobbler","cheesecake"),
           cal=c(400,430,500,550))
df1
df2
left_join(df1,df2,by="fruit")

Investigating joining spatial and non-spatial data

Covid case information is available from opendatadc:

https://opendata.dc.gov/datasets/DCGIS::dc-covid-19-total-positive-cases-by-neighborhood/about

Read cases information:

df_c=read_csv(here("DC_COVID-19_Total_Positive_Cases_by_Neighborhood.csv")) %>% clean_names() 

df_cases=df_c %>%
  filter(as_date(date_reported) == "2022-02-22") %>% 
  separate(neighborhood,into=c("code","name"),sep = ":") %>%
  mutate(code=case_when(
    code=="N35" ~"N0",
    TRUE ~ code
  )) %>%
  select(-objectid,-date_reported)

Regular joining (of dataframes)

neigh2=left_join(neigh,df_cases,by=c("code")) 

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(neigh2) +tm_polygons("total_positives",alpha=.5)

Joining with other spatial data

Let’s get some data using tidycensus. Need an API key https://api.census.gov/data/key_signup.html



census_api_key("00b78ce52463bf386a260d23ec58edb622e6d3ac")
To install your API key for use in future sessions, run this function with `install = TRUE`.
#what variables
v20 = load_variables(2018,"acs5")
# median_family_income="    B06011_001" 
# all "B00001_001"  
#black "B02009_001"

Get some data:

df_cencus=get_acs(geography = "tract",
                  variables=c("median_inc"="B06011_001",
                              "pop"="B01001_001",
                              "pop_black"="B02009_001"),
                  state="DC",geometry=TRUE,year=2018) 
Getting data from the 2014-2018 5-year ACS
Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Using FIPS code '11' for state 'DC'

  |                                                                                             
  |                                                                                       |   0%
  |                                                                                             
  |===============================                                                        |  35%
  |                                                                                             
  |===========================================                                            |  50%
  |                                                                                             
  |===================================================                                    |  58%
  |                                                                                             
  |============================================================                           |  69%
  |                                                                                             
  |=====================================================================                  |  80%
  |                                                                                             
  |=======================================================================================| 100%
class(df_cencus)
[1] "sf"         "data.frame"
plot(df_cencus)

It’s in long format. Let’s make it wide.

df_cens=df_cencus %>% select(-moe) %>% spread(variable,estimate) 

tm_shape(df_cens) +tm_polygons("median_inc",alpha=.5)

  tm_shape(neigh2) +tm_borders(col="blue",lwd=5,alpha=.2)+
  tm_shape(df_cens) +tm_borders(col="red",lwd=1,alpha=.3)
#<<<<<<< HEAD
#df_j=st_join(df_cens,neigh2)
#=======
#df_j=st_join(df_cens,neigh2,prepared=FALSE)
#>>>>>>> aaf01be5cf721819dd2df615aef7a1999bcec0c2
df_cens_adj=df_cens %>% st_transform(4326)
df_j=st_join(df_cens_adj,neigh2,largest=TRUE)
Warning: attribute variables are assumed to be spatially constant throughout all geometries

Other order?:

#<<<<<<< HEAD
##df_j_rev = st_join(neigh2,df_cens_adj,largest=TRUE)
#=======
#df_j_rev = st_join(neigh2,df_cens_adj,largest=TRUE)
#>>>>>>> aaf01be5cf721819dd2df615aef7a1999bcec0c2

Since we want the geometry for the NEIGHBORHOODS, we need a to work a little harder:

df1=df_j %>% select(median_inc,pop,pop_black,objectid) %>%
  group_by(objectid) %>%
  summarise(pop_n=sum(pop),
            pop_black_n=sum(pop_black), 
            adj_median_income=sum(pop*median_inc)/pop_n) 

plot(df1)

#df2=left_join(neigh2,df1)

df2=left_join(neigh2,df1 %>% st_set_geometry(NULL))
Joining, by = "objectid"
df2=df2 %>% mutate(black_perc=pop_black_n/pop_n, covid_rate=total_positives/pop_n)
tm_shape(df2)+tm_polygons(c("adj_median_income","covid_rate","black_perc"))
df2 %>% filter(objectid!=30) %>% tm_shape()+tm_polygons(c("adj_median_income","covid_rate","black_perc"),alpha=.4)
LS0tCnRpdGxlOiAiUmVwbGFjZW1lbnQgQ2xhc3MgLSBDbGVhbmluZyB1cCBhbmQgU3BhdGlhbCBKb2lucyIKZGF0ZTogICIyMDIyLTExLTA5IgphdXRob3I6ICJTYXJhaCBLb2hscyIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKVGhpcyBub3RlYm9va3Mgd2lsbCBoZWxwIG1lIHRhbGsgdGhyb3VnaCBhIGJpdCBvZiBtYXRlcmlhbCByZWdhcmRpbmcgKnNwYXRpYWwgZGF0YSosIAp3aXRoIHNvbWUgaW50cm9kdWN0aW9uIHRvIGBzcGF0aWFsIGpvaW5zYC4KCktleSB0b3BpY3MgZm9yIHRvZGF5OgoKKiBVc2luZyBgc2ZgIHBhY2thZ2UKKiBVc2luZyBgdG1hcGAgcGFja2FnZQoqIFVzaW5nIGB0aWR5Y2Vuc3VzYCBwYWNrYWdlCiogUmVtaW5kZXIgb24gam9pbnMgKGZvY3VzOiBsZWZ0IGpvaW4pCiogT3VyIFNwYXRpYWwgRGF0YQogICAqIE5laWdoYm9yaG9vZHMKICAgKiBKb2luaW5nIHdpdGggbm9uLXNwYXRpYWwgZGF0YQogICAqIENlbnN1cyBkYXRhCiAgICogSm9pbmluZyB3aXRoIHNwYXRpYWwgZGF0YQoqIGlnbm9yZSBodG1sIGluIGdpdCAgKGlmIHRpbWUpCgojIyBQYWNrYWdlcwoKU3RhbmRhcmRzOgoKYGBge3J9CmxpYnJhcnkoa25pdHIpCmxpYnJhcnkodGlkeXZlcnNlKQpsaWJyYXJ5KGphbml0b3IpCmxpYnJhcnkobHVicmlkYXRlKSAjIGJlY2F1c2Ugd2Ugd2lsbCBwcm9iYWJseSBzZWUgc29tZSBkYXRlcwpsaWJyYXJ5KGhlcmUpICMgYSBwYWNrYWdlIEkgaGF2ZW4ndCB0YXVnaHQgeW91IGFib3V0IGJlZm9yZSB0aGF0IGRvZXNuJ3QgZG8gbXVjaCwgYnV0IC4uLi4KbGlicmFyeShzZikKbGlicmFyeSh0bWFwKQpsaWJyYXJ5KHRpZHljZW5zdXMpCmBgYAoKU29tZSBhZGRpdGlvbmFsIHBhY2thZ2VzIGZvY3VzZWQgb24gdG9kYXkncyB3b3JrOgoKYGBge3J9CmxpYnJhcnkoc2YpICMgd29ya2luZyB3aXRoIHNpbXBsZSBmZWF0dXJlcyAtIGdlb3NwYXRpYWwKbGlicmFyeSh0bWFwKQpsaWJyYXJ5KHRpZHljZW5zdXMpCgpgYGAKIyMgSW5mb3JtYXRpb25hbCByZXNvdXJjZXMKCiogQW4gb3ZlcmFsbCByZXNvdXJjZSBvbiBtYXBwaW5nIGluIFI6IGh0dHBzOi8vYm9va2Rvd24ub3JnL25pY29oYWhuL21ha2luZ19tYXBzX3dpdGhfcjUvZG9jcy9pbnRyb2R1Y3Rpb24uaHRtbAoqIEEgc3RhcnRpbmcgcG9pbnQgdG8gbGVhcm4gYWJvdXQgYHNmYDogIGh0dHBzOi8vci1zcGF0aWFsLmdpdGh1Yi5pby9zZi9hcnRpY2xlcy8KKiBHZXR0aW5nIHN0YXJ0ZWQgd2l0aCBgdG1hcGA6IGh0dHBzOi8vY3Jhbi5yLXByb2plY3Qub3JnL3dlYi9wYWNrYWdlcy90bWFwL3ZpZ25ldHRlcy90bWFwLWdldHN0YXJ0ZWQuaHRtbAoqIFRoZSBgdGlkeWNlbnN1c2AgcGFja2FnZTogaHR0cHM6Ly93YWxrZXItZGF0YS5jb20vdGlkeWNlbnN1cy9pbmRleC5odG1sCiogVGhlIGJvb2sgb24gYHRpZHljZW5zdXNgIDogaHR0cHM6Ly93YWxrZXItZGF0YS5jb20vY2Vuc3VzLXIvaW5kZXguaHRtbAoKCiMjIFVzaW5nIHRoZSBOZWlnaGJvcmhvb2QgR2Vvc3BhdGlhbCBEYXRhICh1c2luZyAvZGF0YSkKCgpPdXIgZmlyc3QgZGF0YSBzb3VyY2UgY29tZXMgZnJvbSBvcGVuZGF0YS5kYwoKaHR0cHM6Ly9vcGVuZGF0YS5kYy5nb3YvZGF0YXNldHMvRENHSVM6OmRjLWhlYWx0aC1wbGFubmluZy1uZWlnaGJvcmhvb2RzL2Fib3V0CgoKSSB3aWxsIHVzZSB0aGUgR2VvSlNPTiBmaWxlLiAgKE5ld2VyLCBub3QgbmVjZXNzYXJpbHkgYmV0dGVyLCBidXQgLi4uIGEgc2luZ2xlIGZpbGUuICBOb3Qgc21hbGxlciwgYnV0IC4uLiB0aGlzIG9uZSBpcyBub3QgYmlnLikgIAoKCgoKRGF0YSBpcyBlYXNpbHkgcmVhZGFibGUgCmBgYHtyfQpuZWlnaD1zdF9yZWFkKGhlcmUoIkRDX0hlYWx0aF9QbGFubmluZ19OZWlnaGJvcmhvb2RzLmdlb2pzb24iKSkgJT4lIGNsZWFuX25hbWVzKCkKY2xhc3MobmVpZ2gpCmBgYAoKYGBge3J9CnBsb3QobmVpZ2gpCmBgYAoKCgojIyBSZW1pbmRlciAtIEpvaW5zCgpgYGB7cn0KZGYxPXRpYmJsZShmcnVpdD1jKCJhcHBsZSIsImJhbmFuYSIsImNoZXJyeSIpLGNvc3Q9YygxLjUsMS4yLDIuMjUpKQpkZjI9dGliYmxlKGZydWl0PWMoImFwcGxlIiwiYXBwbGUiLCJjaGVycnkiLCJsZW1vbiIpLAogICAgICAgICAgIGRlc2VydD1jKCJwaWUiLCJjb2JibGVyIiwiY29iYmxlciIsImNoZWVzZWNha2UiKSwKICAgICAgICAgICBjYWw9Yyg0MDAsNDMwLDUwMCw1NTApKQpkZjEKYGBgCmBgYHtyfQpkZjIKYGBgCmBgYHtyfQpsZWZ0X2pvaW4oZGYxLGRmMixieT0iZnJ1aXQiKQpgYGAKCiMjIEludmVzdGlnYXRpbmcgam9pbmluZyBzcGF0aWFsIGFuZCBub24tc3BhdGlhbCBkYXRhCgoKQ292aWQgY2FzZSBpbmZvcm1hdGlvbiBpcyBhdmFpbGFibGUgZnJvbSBvcGVuZGF0YWRjOgoKaHR0cHM6Ly9vcGVuZGF0YS5kYy5nb3YvZGF0YXNldHMvRENHSVM6OmRjLWNvdmlkLTE5LXRvdGFsLXBvc2l0aXZlLWNhc2VzLWJ5LW5laWdoYm9yaG9vZC9hYm91dAoKUmVhZCBjYXNlcyBpbmZvcm1hdGlvbjoKCmBgYHtyIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9CmRmX2M9cmVhZF9jc3YoaGVyZSgiRENfQ09WSUQtMTlfVG90YWxfUG9zaXRpdmVfQ2FzZXNfYnlfTmVpZ2hib3Job29kLmNzdiIpKSAlPiUgY2xlYW5fbmFtZXMoKSAKCmRmX2Nhc2VzPWRmX2MgJT4lCiAgZmlsdGVyKGFzX2RhdGUoZGF0ZV9yZXBvcnRlZCkgPT0gIjIwMjItMDItMjIiKSAlPiUgCiAgc2VwYXJhdGUobmVpZ2hib3Job29kLGludG89YygiY29kZSIsIm5hbWUiKSxzZXAgPSAiOiIpICU+JQogIG11dGF0ZShjb2RlPWNhc2Vfd2hlbigKICAgIGNvZGU9PSJOMzUiIH4iTjAiLAogICAgVFJVRSB+IGNvZGUKICApKSAlPiUKICBzZWxlY3QoLW9iamVjdGlkLC1kYXRlX3JlcG9ydGVkKQoKCmBgYAoKCiMjIFJlZ3VsYXIgam9pbmluZyAob2YgZGF0YWZyYW1lcykKCmBgYHtyfQpuZWlnaDI9bGVmdF9qb2luKG5laWdoLGRmX2Nhc2VzLGJ5PWMoImNvZGUiKSkgCgp0bWFwX21vZGUoInZpZXciKQoKdG1fc2hhcGUobmVpZ2gyKSArdG1fcG9seWdvbnMoInRvdGFsX3Bvc2l0aXZlcyIsYWxwaGE9LjUpCmBgYAoKCiMjIEpvaW5pbmcgd2l0aCBvdGhlciBzcGF0aWFsIGRhdGEKCkxldCdzIGdldCBzb21lIGRhdGEgdXNpbmcgYHRpZHljZW5zdXNgLiAgTmVlZCBhbiBBUEkga2V5ICAgaHR0cHM6Ly9hcGkuY2Vuc3VzLmdvdi9kYXRhL2tleV9zaWdudXAuaHRtbAoKCmBgYHtyfQoKCmNlbnN1c19hcGlfa2V5KCIwMGI3OGNlNTI0NjNiZjM4NmEyNjBkMjNlYzU4ZWRiNjIyZTZkM2FjIikKCiN3aGF0IHZhcmlhYmxlcwp2MjAgPSBsb2FkX3ZhcmlhYmxlcygyMDE4LCJhY3M1IikKIyBtZWRpYW5fZmFtaWx5X2luY29tZT0iCUIwNjAxMV8wMDEiIAojIGFsbCAiQjAwMDAxXzAwMSIJCiNibGFjayAiQjAyMDA5XzAwMSIKYGBgCgoKR2V0IHNvbWUgZGF0YToKCmBgYHtyfQpkZl9jZW5jdXM9Z2V0X2FjcyhnZW9ncmFwaHkgPSAidHJhY3QiLAogICAgICAgICAgICAgICAgICB2YXJpYWJsZXM9YygibWVkaWFuX2luYyI9IkIwNjAxMV8wMDEiLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAicG9wIj0iQjAxMDAxXzAwMSIsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICJwb3BfYmxhY2siPSJCMDIwMDlfMDAxIiksCiAgICAgICAgICAgICAgICAgIHN0YXRlPSJEQyIsZ2VvbWV0cnk9VFJVRSx5ZWFyPTIwMTgpIApgYGAKCmBgYHtyfQpjbGFzcyhkZl9jZW5jdXMpCnBsb3QoZGZfY2VuY3VzKQpgYGAKSXQncyBpbiBsb25nIGZvcm1hdC4gIExldCdzIG1ha2UgaXQgd2lkZS4KYGBge3J9CmRmX2NlbnM9ZGZfY2VuY3VzICU+JSBzZWxlY3QoLW1vZSkgJT4lIHNwcmVhZCh2YXJpYWJsZSxlc3RpbWF0ZSkgCgp0bV9zaGFwZShkZl9jZW5zKSArdG1fcG9seWdvbnMoIm1lZGlhbl9pbmMiLGFscGhhPS41KQpgYGAKCgpgYGB7cn0KCiAgdG1fc2hhcGUobmVpZ2gyKSArdG1fYm9yZGVycyhjb2w9ImJsdWUiLGx3ZD01LGFscGhhPS4yKSsKICB0bV9zaGFwZShkZl9jZW5zKSArdG1fYm9yZGVycyhjb2w9InJlZCIsbHdkPTEsYWxwaGE9LjMpCmBgYAoKCgpgYGB7cn0KIzw8PDw8PDwgSEVBRAojZGZfaj1zdF9qb2luKGRmX2NlbnMsbmVpZ2gyKQojPT09PT09PQojZGZfaj1zdF9qb2luKGRmX2NlbnMsbmVpZ2gyLHByZXBhcmVkPUZBTFNFKQojPj4+Pj4+PiBhYWYwMWJlNWNmNzIxODE5ZGQyZGY2MTVhZWY3YTE5OTliY2VjMGMyCmBgYAoKYGBge3J9CmRmX2NlbnNfYWRqPWRmX2NlbnMgJT4lIHN0X3RyYW5zZm9ybSg0MzI2KQpgYGAKCmBgYHtyfQpkZl9qPXN0X2pvaW4oZGZfY2Vuc19hZGosbmVpZ2gyLGxhcmdlc3Q9VFJVRSkKYGBgCk90aGVyIG9yZGVyPzoKCmBgYHtyfQojPDw8PDw8PCBIRUFECiMjZGZfal9yZXYgPSBzdF9qb2luKG5laWdoMixkZl9jZW5zX2FkaixsYXJnZXN0PVRSVUUpCiM9PT09PT09CiNkZl9qX3JldiA9IHN0X2pvaW4obmVpZ2gyLGRmX2NlbnNfYWRqLGxhcmdlc3Q9VFJVRSkKIz4+Pj4+Pj4gYWFmMDFiZTVjZjcyMTgxOWRkMmRmNjE1YWVmN2ExOTk5YmNlYzBjMgpgYGAKClNpbmNlIHdlIHdhbnQgdGhlIGdlb21ldHJ5IGZvciB0aGUgTkVJR0hCT1JIT09EUywgd2UgbmVlZCBhIHRvIHdvcmsgYSBsaXR0bGUgaGFyZGVyOgoKYGBge3J9CmRmMT1kZl9qICU+JSBzZWxlY3QobWVkaWFuX2luYyxwb3AscG9wX2JsYWNrLG9iamVjdGlkKSAlPiUKICBncm91cF9ieShvYmplY3RpZCkgJT4lCiAgc3VtbWFyaXNlKHBvcF9uPXN1bShwb3ApLAogICAgICAgICAgICBwb3BfYmxhY2tfbj1zdW0ocG9wX2JsYWNrKSwgCiAgICAgICAgICAgIGFkal9tZWRpYW5faW5jb21lPXN1bShwb3AqbWVkaWFuX2luYykvcG9wX24pIAoKcGxvdChkZjEpCmBgYAoKYGBge3J9CiNkZjI9bGVmdF9qb2luKG5laWdoMixkZjEpCgpkZjI9bGVmdF9qb2luKG5laWdoMixkZjEgJT4lIHN0X3NldF9nZW9tZXRyeShOVUxMKSkKCmBgYAoKYGBge3J9CmRmMj1kZjIgJT4lIG11dGF0ZShibGFja19wZXJjPXBvcF9ibGFja19uL3BvcF9uLCBjb3ZpZF9yYXRlPXRvdGFsX3Bvc2l0aXZlcy9wb3BfbikKdG1fc2hhcGUoZGYyKSt0bV9wb2x5Z29ucyhjKCJhZGpfbWVkaWFuX2luY29tZSIsImNvdmlkX3JhdGUiLCJibGFja19wZXJjIikpCmBgYAoKCgpgYGB7cn0KZGYyICU+JSBmaWx0ZXIob2JqZWN0aWQhPTMwKSAlPiUgdG1fc2hhcGUoKSt0bV9wb2x5Z29ucyhjKCJhZGpfbWVkaWFuX2luY29tZSIsImNvdmlkX3JhdGUiLCJibGFja19wZXJjIiksYWxwaGE9LjQpCmBgYAoKCgoKCg==